English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling
Abstract
This paper presents a method for verb phrase (VP) alignment in an English/French parallel corpus and its use for improving statistical machine translation (SMT) of verb tenses. The method starts from automatic word alignment performed with GIZA++, and relies on a POS tagger and a parser, in combination with several heuristics, in order to identify non-contiguous components of VPs, and to label the aligned VPs with their tense and voice on each side. This procedure is applied to the Europarl corpus, leading to the creation of a smaller, high-precision parallel corpus with about 320 000 pairs of finite VPs, which is made publicly available. This resource is used to train a tense predictor for translation from English into French, based on a large number of surface features. Three MT systems are compared: (1) a baseline phrase-based SMT; (2) a tense-aware SMT system using the above predictions within a factored translation model; and (3) a system using oracle predictions from the aligned VPs. For several tenses, such as the French imparfait, the tense-aware SMT system improves significantly over the baseline and is closer to the oracle system.
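The step from word alignment to VP alignment can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the verb phrases have already been identified on each side as sets of token indices (so non-contiguous VPs are allowed) and that word alignment links are available as index pairs, for example as read from a word aligner's output. The function name `align_verb_phrases`, the greedy pairing rule, and the example sentence pair are all hypothetical choices made for illustration.

```python
from collections import Counter


def align_verb_phrases(src_vps, tgt_vps, word_links):
    """Pair source and target VPs by counting shared word-alignment links.

    src_vps, tgt_vps: lists of sets of token indices (a VP may be non-contiguous).
    word_links: iterable of (src_index, tgt_index) word-alignment pairs.
    Returns a sorted list of (src_vp_id, tgt_vp_id) pairs, one-to-one.
    """
    # Map every token index to the id of the VP that contains it.
    src_owner = {i: k for k, vp in enumerate(src_vps) for i in vp}
    tgt_owner = {j: k for k, vp in enumerate(tgt_vps) for j in vp}

    # Count how many alignment links connect each (source VP, target VP) pair.
    votes = Counter()
    for i, j in word_links:
        if i in src_owner and j in tgt_owner:
            votes[(src_owner[i], tgt_owner[j])] += 1

    # Greedy one-to-one pairing: best-supported pairs first (an assumed heuristic).
    pairs, used_src, used_tgt = [], set(), set()
    for (s, t), _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])):
        if s not in used_src and t not in used_tgt:
            pairs.append((s, t))
            used_src.add(s)
            used_tgt.add(t)
    return sorted(pairs)


# Hypothetical example: "I have not seen him" / "Je ne l' ai pas vu"
# English VP tokens: have(1), seen(3); French VP tokens: ai(3), vu(5).
links = [(0, 0), (1, 3), (2, 1), (2, 4), (3, 5), (4, 2)]
example_pairs = align_verb_phrases([{1, 3}], [{3, 5}], links)
```

In this example the negation particles sit between the auxiliary and the participle on both sides, and the two links have/ai and seen/vu are enough to pair the VPs. A high-precision pipeline such as the one described above would add further filters, for instance discarding pairs supported by too few links.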
Similar resources
A Framework for Managing Verb Phrase Effective and Easy English-Hindi Machine Translation
Automatic machine translation from one language to another has been a subject of great attention in computational linguistics for many years. In English-Hindi machine translation, verb tuning is a vital operation. The present paper describes an approach to easy English-Hindi verb phrase mapping. This work gives satisfactory machine translation results across types of English sentences. It is observe...
Syntax Augmented Machine Translation via Chart Parsing with Integrated Language Modeling
We present a hierarchical phrase-based translation model which annotates and generalizes existing phrase translations with syntactic categories derived from parsing the target side of a parallel corpus. We associate target parse trees for each training sentence pair with a search lattice constructed from the existing phrase translations on the corresponding source sentence, and consider techniq...
Modeling verbal inflection for English to German SMT
German verbal inflection is frequently wrong in standard statistical machine translation approaches. German verbs agree with subjects in person and number, and they bear information about mood and tense. For subject–verb agreement, we parse German MT output to identify subject–verb pairs and ensure that the verb agrees with the subject. We show that this approach improves subject-verb agreement...
Cross-linguistic annotation of narrativity for English/French verb tense disambiguation
This paper presents manual and automatic annotation experiments for a pragmatic verb tense feature (narrativity) in English/French parallel corpora. The feature is considered to play an important role for translating English Simple Past tense into French, where three different tenses are available. Whether the French Passé Composé, Passé Simple or Imparfait should be used is highly dependent on...
Publication date: 2014